CAFTAN: a tool for fast mapping, and quality assessment of cDNAs
Background: The German cDNA Consortium has been cloning full-length cDNAs and has continued with their
exploitation in protein localization experiments and cellular assays. However, the efficient use of large cDNA
resources requires strategies capable of rapidly separating truly useful cDNAs
from biological and experimental noise. To this end we have developed a new high-throughput analysis tool,
CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their
systematic annotation and application in functional genomics.
Results: CAFTAN is built around the mapping of cDNAs to the genome assembly, and the subsequent analysis
of their genomic context. It uses sequence features such as the presence and type of polyA signals, inner and flanking
repeats, GC content and splice site types. Each feature is evaluated in an individual test, and the results classify
cDNAs according to their sequence quality and their likelihood of having been generated from fully processed mRNAs.
Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference
sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping
exons and the structural classification of cDNAs with respect to the reference set of splice variants.
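The comparison with reference sets can be pictured as an interval-overlap problem. The Python sketch below is an illustration under stated assumptions, not CAFTAN's own code (which is Perl): exons are 1-based inclusive `(start, end)` pairs on the same chromosome and strand, and the category names `identical`, `variant` and `novel` are hypothetical labels chosen for this example.

```python
# Hypothetical sketch: compare a mapped cDNA's exons with reference
# transcripts (e.g. exported from VEGA or Ensembl) on the same strand.
# Coordinates are 1-based, inclusive (start, end) pairs; the category
# names are illustrative, not CAFTAN's own labels.
from typing import List, Tuple

Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_exons(cdna: List[Exon], ref: List[Exon]) -> List[Tuple[Exon, Exon]]:
    """All (cDNA exon, reference exon) pairs that overlap."""
    return [(c, r) for c in cdna for r in ref if overlaps(c, r)]


def classify(cdna: List[Exon], references: List[List[Exon]]) -> str:
    """Classify a cDNA against a set of reference splice variants."""
    best = "novel"  # no exon overlaps any reference transcript
    for ref in references:
        if sorted(cdna) == sorted(ref):
            return "identical"  # same exon structure as a known variant
        if overlapping_exons(cdna, ref):
            best = "variant"  # shares sequence but differs in structure
    return best
```

An all-pairs scan is enough for a handful of transcripts; for genome-scale reference sets, an interval tree or a sorted sweep would avoid the quadratic cost.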
The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known
protein-coding" VEGA cDNAs as high-quality multi- or single-exon cDNAs. It classified 80.6% of the single-exon
cDNAs and 85% of the multi-exon cDNAs as good.
The program is written in Perl in a modular way, which allows the strategy to be adapted to other tasks such as
EST annotation, or extended by adding new classification rules and new organism databases as they become
available. We think that it is a very useful program for the annotation and study of unfinished genomes.
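The modular design, in which new classification rules can be added, can be sketched as a rule registry. This is a hypothetical Python illustration rather than CAFTAN's Perl code; the 50-nt 3' window, the `AATAAA`/`ATTAAA` hexamers and the 0.35 to 0.65 GC-content bounds are assumptions chosen for the example.

```python
# Hypothetical sketch of a modular, rule-based cDNA quality check.
# Window size, hexamers and GC bounds are illustrative assumptions,
# not CAFTAN's actual parameters.
from typing import Callable, Dict

Rule = Callable[[str], bool]
RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a new classification rule under `name`."""
    def register(fn: Rule) -> Rule:
        RULES[name] = fn
        return fn
    return register


@rule("polyA_signal")
def has_polya_signal(seq: str) -> bool:
    # Canonical (AATAAA) or common variant (ATTAAA) hexamer near the 3' end.
    tail = seq.upper()[-50:]
    return "AATAAA" in tail or "ATTAAA" in tail


@rule("gc_content")
def gc_in_range(seq: str, low: float = 0.35, high: float = 0.65) -> bool:
    # Fraction of G and C bases must fall inside [low, high].
    s = seq.upper()
    gc = (s.count("G") + s.count("C")) / max(len(s), 1)
    return low <= gc <= high


def evaluate(seq: str) -> Dict[str, bool]:
    """Run every registered rule; each result is reported separately."""
    return {name: fn(seq) for name, fn in RULES.items()}


def is_high_quality(seq: str) -> bool:
    """A cDNA passes only if every individual test passes."""
    return all(evaluate(seq).values())
```

Adding a rule is a matter of decorating another function with `@rule(...)`; the per-rule results from `evaluate` mirror the idea of individual tests whose outcomes are combined into a classification.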
Conclusion: CAFTAN is a high-throughput sequence analysis tool, which performs a fast and reliable quality
prediction of cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator or scientist a
quick first overview of the quality and the existing annotation of a set of cDNAs. It supports the
rejection of low-quality cDNAs and helps in the selection of likely novel splice variants and/or completely novel
transcripts for new experiments.
German Federal Ministry of Education and Research 01GR0101 and 01GR0420 and 01GR045
Rhomboid Protease Dynamics and Lipid Interactions
Intramembrane proteases, which cleave transmembrane
(TM) helices, participate in numerous biological
processes encompassing all branches of life.
Several crystallographic structures of Escherichia
coli GlpG rhomboid protease have been determined.
In order to understand GlpG dynamics and lipid interactions
in a native-like environment, we have examined
the molecular dynamics of wild-type and mutant
GlpG in different membrane environments. The irregular
shape and small hydrophobic thickness of the
protein cause significant bilayer deformations that
may be important for substrate entry into the active
site. Hydrogen-bond interactions with lipids are
paramount in protein orientation and dynamics.
Mutations in the unusual L1 loop cause changes in
protein dynamics and protein orientation that are
relayed to the His-Ser catalytic dyad. Similarly, mutations
in TM5 change the dynamics and structure of
the L1 loop. These results imply that the L1 loop
has an important regulatory role in proteolysis.
National Institute of General Medical Sciences (GM-74637
Uncovering the complex genetic architecture of human plasma lipidome using machine learning methods
The genetic architecture of the plasma lipidome provides insights into the regulation of lipid metabolism
and related diseases. We applied an unsupervised machine learning method, PGMRA, to discover
phenotype-genotype many-to-many relations between genotype and plasma lipidome (phenotype)
in order to identify the genetic architecture of the plasma lipidome profiled from 1,426 Finnish individuals
aged 30–45 years. PGMRA involves biclustering genotype and lipidome data independently followed
by their inter-domain integration based on hypergeometric tests of the number of shared individuals.
Pathway enrichment analysis was performed on the SNP sets to identify their associated biological
processes. We identified 93 statistically significant (hypergeometric p-value < 0.01) lipidome-genotype
relations. Genotype biclusters in these 93 relations contained 5977 SNPs across 3164 genes.
Twenty-nine of the 93 relations contained genotype biclusters with more than 50% unique SNPs
and participants, thus representing the most distinct subgroups. We identified 30 significantly enriched
biological processes among the SNPs involved in 21 of these 29 most distinct genotype-lipidome
subgroups through which the identified genetic variants can influence and regulate plasma lipid-related
metabolism and profiles. This study identified 29 distinct genotype-lipidome subgroups in the
studied Finnish population that may have distinct disease trajectories and therefore could be useful in
precision medicine research.
Research Council of Finland; Social Insurance Institution of Finland; Competitive State Research Financing of Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals; Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; Finnish IT center for science; Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association 322098, 286284, 134309, 126925, 121584, 124282, 255381, 256474, 283115, 319060, 320297, 314389, 338395, 330809, 104821, 129378, 117797, 141071; INFRAIA-2016-1-730897; Horizon 2020; European Research Council (ERC); European Commission 349708; Tampere University Hospital Supporting Foundation; Finnish Society of Clinical Chemistry; Spanish Government RTI2018-098983-B-100; Laboratoriolääketieteen Edistämissäätiö sr; Ida Montinin säätiö; Kalle Kaiharin säätiö; Aarne Koskelon säätiö; Faculty of Medicine and Health Technology, Tampere University; Project HPC-EUROPA3 X51001, 50191928; EC Research Innovation Action under H2020 Programme 75532
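The inter-domain integration step of PGMRA rests on a hypergeometric test of how many individuals a genotype bicluster and a lipidome bicluster share. A minimal Python sketch, assuming each bicluster is represented simply as a set of participant IDs (the function names are hypothetical, and PGMRA's own implementation may differ), computes the upper-tail probability of observing at least that many shared individuals by chance:

```python
# Hypothetical sketch of a hypergeometric overlap test between two
# biclusters drawn from the same cohort of N participants.
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic until this final division.
    return tail / comb(N, n)


def shared_individuals_p(genotype_members: set, lipidome_members: set, n_total: int) -> float:
    """p-value for the overlap of a genotype and a lipidome bicluster."""
    shared = len(genotype_members & lipidome_members)
    return hypergeom_sf(shared, n_total, len(genotype_members), len(lipidome_members))
```

For this cohort, `shared_individuals_p(g, l, 1426)` would give the p-value for one genotype-lipidome pair, to be compared against the p < 0.01 threshold reported in the abstract. Python's arbitrary-precision integers keep `math.comb` exact even for large cohorts.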
Optimization of multi-classifiers for computational biology: application to gene finding and expression
Genomes of many organisms have been
sequenced over the last few years. However, transforming
such raw sequence data into knowledge remains a hard
task. A great number of prediction programs have been
developed to address part of this problem: the location of
genes along a genome and their expression. We propose a
multi-objective methodology to combine state-of-the-art
algorithms into an aggregation scheme in order to obtain
optimal aggregations of methods. The results obtained show
a major improvement in sensitivity when our methodology
is compared to the performance of individual methods for
gene finding and gene expression problems. The methodology
proposed here is an automatic method generator and a
step toward exploiting all existing methods, by
providing alternative optimal aggregations of methods to
answer specific queries for a given biological problem
with maximized prediction accuracy. As more
approaches are integrated for each of the presented problems,
de novo accuracy can be expected to improve further.
Ministry of Science and Innovation, Spain (MICINN); Spanish Government TIN-2006-12879; Junta de Andalucía TIC-02788; Howard Hughes Medical Institute; European Commission; Junta de Andalucía
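The idea of combining existing predictors and keeping the optimal aggregations under several objectives can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the authors' method: predictions are binary per position (1 = coding or expressed), aggregation is k-of-m voting over every subset of methods, and "optimal" means Pareto-optimal for sensitivity and specificity. Exhaustive enumeration only scales to a few methods; a heuristic multi-objective search would be needed for larger sets.

```python
# Hypothetical sketch: k-of-m voting aggregations of binary predictors,
# filtered to the Pareto front of (sensitivity, specificity).
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Prediction = List[int]  # 1 = predicted positive, 0 = negative


def metrics(pred: Prediction, truth: Prediction) -> Tuple[float, float]:
    """Sensitivity and specificity of a binary prediction."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    pos = sum(truth)
    neg = len(truth) - pos
    return tp / pos, tn / neg


def aggregations(preds: Dict[str, Prediction]) -> Iterator[Tuple[str, Prediction]]:
    """Every subset of methods combined by k-of-m voting, for all k."""
    names = sorted(preds)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            votes = [sum(col) for col in zip(*(preds[m] for m in subset))]
            for k in range(1, size + 1):
                yield f"{k}-of-{'+'.join(subset)}", [int(v >= k) for v in votes]


def pareto_front(preds: Dict[str, Prediction], truth: Prediction):
    """Aggregations not dominated in (sensitivity, specificity)."""
    scored = [(name, metrics(p, truth)) for name, p in aggregations(preds)]
    front = []
    for name, (se, sp) in scored:
        dominated = any(se2 >= se and sp2 >= sp and (se2, sp2) != (se, sp)
                        for _, (se2, sp2) in scored)
        if not dominated:
            front.append((name, se, sp))
    return front
```

Returning the whole front, rather than a single winner, matches the abstract's notion of offering alternative optimal aggregations for a given query.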
Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics
Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related alpha-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern blot experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as the most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5'-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of alpha-proteobacteria with their eukaryotic hosts.
Spanish Ministerio de Educación y Ciencia (Project AGL2006-12466/AGR); Junta de Andalucía (Project CV1-01522); NIH Grant 1R01GM070538-02; FPI Fellowship from the Spanish Ministerio de Educación y Ciencia
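The genome-wide scan starts from the intergenic regions of the genome. A minimal Python sketch for deriving them from annotated gene coordinates follows; it assumes 1-based inclusive coordinates, ignores strand, treats each replicon as linear (S. meliloti replicons are circular, so a region spanning the origin would need extra handling) and uses a hypothetical `min_length` cutoff.

```python
# Hypothetical sketch: intergenic regions of one replicon from gene
# coordinates (1-based, inclusive). Strand is ignored and the replicon
# is treated as linear; min_length is an illustrative cutoff.
from typing import List, Tuple


def intergenic_regions(genes: List[Tuple[int, int]], genome_length: int,
                       min_length: int = 50) -> List[Tuple[int, int]]:
    regions = []
    cursor = 1  # first base not yet covered by any gene
    for start, end in sorted(genes):
        # Gap from cursor to start - 1 has length start - cursor.
        if start - cursor >= min_length:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)  # also handles overlapping genes
    if genome_length - cursor + 1 >= min_length:
        regions.append((cursor, genome_length))
    return regions
```

The resulting regions would then be the input for comparative alignment and scoring with tools such as eQRNA and RNAz.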
Temperament & Character account for brain functional connectivity at rest: A diathesis-stress model of functional dysregulation in psychosis
The online version contains supplementary material available at https://doi.org/10.1038/s41380-023-02039-6
The human brain’s resting-state functional connectivity (rsFC) provides stable trait-like measures of differences in the perceptual,
cognitive, emotional, and social functioning of individuals. The rsFC of the prefrontal cortex is hypothesized to mediate a person’s
rational self-government, as is also measured by personality, so we tested whether its connectivity networks account for
vulnerability to psychosis and related personality configurations. Young adults were recruited as outpatients or controls from the
same communities around psychiatric clinics. Healthy controls (n = 30) and clinically stable outpatients with bipolar disorder
(n = 35) or schizophrenia (n = 27) were diagnosed by structured interviews, and then were assessed with standardized protocols of
the Human Connectome Project. Data-driven clustering identified five groups of patients with distinct patterns of rsFC regardless of
diagnosis. These groups were distinguished by rsFC networks that regulate specific biopsychosocial aspects of psychosis: sensory
hypersensitivity, negative emotional balance, impaired attentional control, avolition, and social mistrust. The rsFC group differences
were validated by independent measures of white matter microstructure, personality, and clinical features not used to identify the
subjects. We confirmed that each connectivity group was organized by differential collaborative interactions among six prefrontal
and eight other automatically coactivated networks. The temperament and character traits of the members of these groups
strongly accounted for the differences in rsFC between groups, indicating that configurations of rsFC are internal representations of
personality organization. These representations involve weakly self-regulated emotional drives of fear, irrational desire, and
mistrust, which predispose to psychopathology. However, stable outpatients with different diagnoses (bipolar or schizophrenic
psychoses) were highly similar in rsFC and personality. This supports a diathesis-stress model in which different complex adaptive
systems regulate predisposition (which is similar in stable outpatients despite diagnosis) and stress-induced clinical dysfunction
(which differs by diagnosis).
EU FEDER grants through the Spanish Ministry of Science and Technology PID2021-125017OB-I00, RTI2018-098983-B-I00, D43 TW011793-06A1; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA R01-MH124060; Psychosis-Risk Outcomes Network U01 MH12463
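The two computational steps named in the rsFC abstract, measuring functional connectivity and clustering subjects by their connectivity patterns, can be sketched in plain Python. This is illustrative only: rsFC is taken here as the Pearson correlation between regional time series, and a basic k-means with deterministic initialization stands in for the study's data-driven clustering, whose details are not given in this abstract.

```python
# Illustrative sketch: per-subject rsFC feature vectors and a plain
# k-means. k-means is a stand-in, not the study's clustering method.
from math import sqrt
from typing import List

Vector = List[float]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson correlation of two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rsfc_vector(timeseries: List[Vector]) -> Vector:
    """Upper triangle (i < j) of the region-by-region correlation matrix."""
    r = len(timeseries)
    return [pearson(timeseries[i], timeseries[j])
            for i in range(r) for j in range(i + 1, r)]


def sq_dist(p: Vector, q: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means; the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Regions with a constant signal would make `pearson` divide by zero, so a real pipeline would filter or regularize them before building the feature vectors.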
Gene network downstream plant stress response modulated by peroxisomal H2O2
Reactive oxygen species (ROS) act as secondary messengers that can be sensed
by specific redox-sensitive proteins responsible for the activation of signal
transduction culminating in altered gene expression. The subcellular site, in
which modifications in the ROS/oxidation state occur, can also act as a specific
cellular redox network signal. The chemical identity of ROS and their subcellular
origin leave a specific imprint on the transcriptome response. In recent
years, a number of transcriptomic studies related to altered ROS metabolism in
plant peroxisomes have been carried out. In this study, we conducted a meta-analysis
of these transcriptomic findings to identify common transcriptional
footprints for plant peroxisomal-dependent signaling at early and later time
points. These footprints highlight the regulation of various metabolic pathways
and gene families, which are also found in plant responses to several abiotic
stresses. Major peroxisomal-dependent genes are associated with protein
and endoplasmic reticulum (ER) protection at later stages of stress while, at
earlier stages, these genes are related to hormone biosynthesis and signaling
regulation. Furthermore, in silico analyses allowed us to assign human orthologs
to some of the peroxisomal-dependent proteins, which are mainly associated
with different cancer pathologies. Peroxisomal footprints provide a valuable
resource for assessing and supporting key peroxisomal functions in cellular
metabolism under control and stress conditions across species.
Spanish Ministry of Science, Innovation and Universities (MCIU); State Research Agency (AEI); FEDER grant PGC2018-098372-B-I00; MCIU Research Personnel Training (FPI) grant BES-2016-07651
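A common transcriptional footprint across studies can be thought of as the set of genes that change in the same direction in several datasets. The Python sketch below assumes each study has been reduced to a mapping from gene ID to `"up"` or `"down"` and uses a hypothetical `min_studies` threshold; the actual meta-analysis criteria are not described in this abstract.

```python
# Hypothetical sketch: genes changing in the same direction in at least
# `min_studies` transcriptomic studies, computed separately per stage.
from collections import Counter
from typing import Dict, List, Set, Tuple

Study = Dict[str, str]  # gene ID -> "up" or "down"


def common_footprint(studies: List[Study], min_studies: int) -> Set[Tuple[str, str]]:
    """(gene, direction) pairs reported consistently in >= min_studies studies."""
    counts = Counter((gene, direction) for study in studies
                     for gene, direction in study.items())
    return {pair for pair, n in counts.items() if n >= min_studies}


def footprints_by_stage(stages: Dict[str, List[Study]], min_studies: int):
    """Separate footprints for, e.g., early and late time points."""
    return {stage: common_footprint(studies, min_studies)
            for stage, studies in stages.items()}
```

Counting `(gene, direction)` pairs rather than genes alone keeps a gene that is induced in some studies and repressed in others from being reported as a consistent footprint.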
Evolution of genetic networks for human creativity
The genetic basis for the emergence of creativity in modern humans remains a mystery despite the sequencing of the genomes of
chimpanzees and Neanderthals, our closest hominid relatives. Data-driven methods allowed us to uncover networks of genes
distinguishing the three major systems of modern human personality and adaptability: emotional reactivity, self-control, and
self-awareness. Now we have identified which of these genes are present in chimpanzees and Neanderthals. We replicated
our findings in separate analyses of three high-coverage genomes of Neanderthals. We found that Neanderthals had nearly
the same genes for emotional reactivity as chimpanzees, and they were intermediate between modern humans and
chimpanzees in their numbers of genes for both self-control and self-awareness. Of the 267 genes we found only in
modern humans, 95% were not protein-coding, including many long non-coding RNAs in the self-awareness network. These
genes may have arisen by positive selection for the characteristics of human well-being and behavioral modernity, including
creativity, prosocial behavior, and healthy longevity. The genes that cluster in association with those found only in modern
humans are over-expressed in brain regions involved in human self-awareness and creativity, including late-myelinating and
phylogenetically recent regions of neocortex for autobiographical memory in frontal, parietal, and temporal regions, as well
as related components of cortico-thalamo-ponto-cerebellar-cortical and cortico-striato-cortical loops. We conclude that
modern humans have more than 200 unique non-protein-coding genes regulating co-expression of many more protein-coding genes in coordinated networks that underlie their capacities for self-awareness, creativity, prosocial behavior, and
healthy longevity, which are not found in chimpanzees or Neanderthals.
Analyzing gender disparities in STEAM: A Case Study from Bioinformatics Workshops in the University of Granada
Bioinformatics is an interdisciplinary area that has attracted strong interest from both academia and corporations in recent years. This rising area combines knowledge and skills from Bio and Science, Technology, Engineering, Arts and Mathematics (STEAM) areas. One of the advantages of the synergy between these two work areas is that it offers an opportunity to close the traditional STEM gender gap.
Despite this opportunity and the significance and wide application of the bioinformatics field, this topic has still not gained enough visibility in the graduate programs for the Bio bachelor's degrees at the University of Granada. This has motivated the organization of an annual “Educational Workshop on Bioinformatics” at the University of Granada by the Department of Computer Science and Artificial Intelligence. Results of the analysis of the first two editions of this workshop show great interest in the topic from the university community at all levels (e.g. undergraduate and graduate students, teachers and researchers), without significant differences between genders at the global level. When analyzing the student group, women did show a higher interest in the subject. However, this interest was not reflected in the higher university strata (teachers and researchers), which offers a glimpse of the current general situation in Spain in this area.
Universidad de Granada: Departamento de Arquitectura y Tecnología de Computadores
Identification of novel prostate cancer genes in patients stratified by Gleason classification: Role of antitumoral genes
Spanish Ministry of Science and Innovation, Grant/Award Number: PRE2019-089807; Spanish Ministry of Science and Technology, Grant/Award Numbers: PI15/00914, RTI2018-098983-B-100; Universidad de Granada/CBUA
Prostate cancer (PCa) is a highly heterogeneous tumor at both the molecular and
clinical level. Despite its generally good prognosis, cases range from indolent to lethal
metastatic disease, and research efforts aim to identify those with worse outcomes. Current
prognostic markers, such as the Gleason score, fall short when it comes to distinguishing
these cases. Identifying new early biomarkers that enable better PCa distinction
and classification remains a challenge. To identify new genes implicated in PCa
progression, we conducted several differential gene expression analyses on paired
samples, comparing primary PCa tissue against healthy prostatic tissue from PCa patients.
The results show that this approach is a serious alternative for overcoming
patient heterogeneity. We identified 250 genes whose expression varies
with tissue differentiation (from healthy to tumor tissue); 161 of these genes are
described here for the first time as related to PCa. Further manual curation of
these genes allowed us to annotate 39 genes with antitumoral activity, 22 of which are
described for the first time as related to PCa proliferation and metastasis. For most genes, these
findings could be replicated in different cohorts. Taken together, the
paired differential expression, functional annotation and replication results
point to CGREF1, UNC5A, C16orf74, LGR6, IGSF1, QPRT and CA14 as possible new
early markers in PCa. These genes may prevent the progression of the disease, and
their expression should be studied in patients with different outcomes.
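The paired design compares tumor and healthy tissue from the same patients, so each patient serves as their own control. A minimal Python sketch for one gene, assuming normalized log2 expression values (so that each per-patient difference is a log2 fold change) and using an exact sign-flip permutation test as a stand-in for whatever paired test the study applied:

```python
# Hypothetical sketch: paired tumor-vs-healthy test for one gene.
# Inputs are assumed to be normalized log2 expression values, one pair
# per patient; the sign-flip test is a stand-in, not the study's method.
from itertools import product
from statistics import mean
from typing import List, Tuple


def paired_sign_flip_test(tumor: List[float], healthy: List[float]) -> Tuple[float, float]:
    """Return (mean log2 fold change, exact two-sided p-value)."""
    diffs = [t - h for t, h in zip(tumor, healthy)]
    observed = mean(diffs)
    extreme = 0
    total = 0
    # Under H0 each per-patient difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = mean(s * d for s, d in zip(signs, diffs))
        total += 1
        if abs(flipped) >= abs(observed) - 1e-12:  # tolerance for float ties
            extreme += 1
    return observed, extreme / total
```

Enumerating all 2^n sign patterns is exact but only practical for small cohorts; when thousands of genes are tested, the resulting p-values would also need multiple-testing correction.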